Empirical Analysis of the Effect of Dimension Reduction and Word Order on Semantic Vectors
نویسندگان
چکیده
The aim of this paper is to provide a comparison of various algorithms and parameters to build reduced semantic spaces. The effect of dimension reduction, the stability of the representation and the effect of word order are examined in the context of the five algorithms bearing on semantic vectors: Random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations and holographic reduced representations (HRR). The quality of semantic representation was tested by means of synonym finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of semantic representation but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. The effect of encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence in context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
منابع مشابه
Word clustering effect on vocabulary learning of EFL learners: A case of semantic versus phonological clustering
The aim of this study is to determine the effect of word clustering method on vocabulary learning of Iranian EFL learners through a case of semantic versus phonological clustering. To this effect, 80 homogeneous students from four intermediate classes at an English institute in Torbat e Heydariyeh participated in this research. They were assigned to four groups according to semantic versus phon...
متن کاملFeature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context
Nowadays, users can share their ideas and opinions with widespread access to the Internet and especially social networks. On the other hand, the analysis of people's feelings and ideas can play a significant role in the decision making of organizations and producers. Hence, sentiment analysis or opinion mining is an important field in natural language processing. One of the most common ways to ...
متن کاملEffect of Near-orthogonality on Random Indexing Based Extractive Text Summarization
Application of Random Indexing (RI) to extractive text summarization has already been proposed in literature. RI is an approximating technique to deal with high-dimensionality problem of Word Space Models (WSMs). However, the distinguishing feature of RI from other WSMs (e.g. Latent Semantic Analysis (LSA)) is the near-orthogonality of the word vectors (index vectors). The near-orthogonality pr...
متن کاملThe Semantics of the Word Istikbar (Arrogance) in the Holy Quran based on Syntagmatic Relations(A Case Study of Semantic Proximity and Semantic Contrast)
The word istikbar (arrogance) is one of the key words in the monotheistic system of the Quran, which has found a special status as a special feature of the opponents and adversaries of the call to the truth. Given the prominent role of this issue in the human life system and its provision of corruption and moral deviations, it is necessary to represent the nature of the elements that make up th...
متن کاملLinear transformations of semantic spaces for word-sense discrimination and collocation compositionality grading
Latent Semantic Analysis (LSA) and Word Space are two semantic models derived from the vector space model of distributional semantics that have been used successfully in word-sense disambiguation and discrimination. LSA can represent word types and word tokens in context by means of a single matrix factorised by Singular Value Decomposition (SVD). Word Space is able to represent types via word ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Int. J. Semantic Computing
دوره 6 شماره
صفحات -
تاریخ انتشار 2012